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Structure of a DBMS 


¢ A typical DBMS has a 
layered architecture. 


¢ The figure does not show the 
concurrency control and 
recovery components. 


¢ This is one of several 
possible architectures; each 
system has its own 
variations. 
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Structure of a DBMS —— 
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User 1 User 2 User 3 User 1 User 2 User 3 
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Query Optimization 
and Execution Concurrency Control ensures 
Balai Te’ t correct/fast data access 
a when DB is used by many 
Files and Access Methods mats 


Buffer Management 
; Recovery ensures that DB is 
Disk Space Management fauiereionant 
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Summary omy | QSaes 
¢ Transactions are important: 
¢ Concurrency 
¢ Recovery 
¢ Xact Properties 


ACID Properties 


Summary 

¢ Transactions are important: 
¢ Concurrency 
¢ Recovery 

¢ Xact Fropentiee 


» Atomicity — 


“Each transaction is “all or nothing” 


Durability 


Committed data would not be lost, even after power failure. 


Approach #1: Logging Approach #2: Shadow Paging 
DBMS logs all actions so that itcan undo the The DBMS makes copies of pages modified by 
actions of aborted transactions. Most the transactions and transactions Mas 


common approach. changes to those copies. 


Other possible solutions 


No Steal Steal 
Simplest to implement aan ) 
Least Efficient 


. No UNDO UNDO 
Orc€ NoREDO § No REDO 


No Force No UNDO ~~ UNDO 


REDO ———— Hardest to implement 


Most Efficient 
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¢ For every update, record information to allow REDO and UNDO 
information in a /og. 


¢ Sequential writes to log (put it on a separate disk). 
¢ Minimal info (diff) written to log, so multiple updates fit in a single log page. 
° |og: An ordered list of log records to allow REDO/UNDO actions 
¢ Log record contains: 
<XID, pagelD, offset, length, old data, new data> 
¢ and additional control info (which we'll see soon). 


: : SS 
Write-Ah Logging (WAL sree | QB avers 
U NIV E R 5 1T Y COMPUTER SCIENCE 


¢ The Write-Ahead Logging Protocol 


¢ 1. Must force the !og record for an update be/ore the corresponding data 
oage gets to disk. 


¢ 2. Must write all log records for a Xact before commit. 
¢ #1 (with UNDO information) guarantees Atomicity. 
¢ #2 (with REDO information) guarantees Durability. 
¢ WAL allows us to implement STEAL/No-Force. 


¢ Exactly how is logging (and recovery!) done? 
¢ We'll study the ARIES algorithms. 
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LSNs pageLSNs _ flushedLSN 


¢ Log: ordered file with a write buffer 
(tail) in memory 


¢ Each log record has a Log Sequence 
Number (LSN). 


¢ LSNs are unique/increasing. 


Log records flushed to disk ,, 
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LSNs pageLSNs _ flushedLSN 


¢ Log: ordered file with a write buffer 
(tail) in memory 
¢ Each log record has a Log Sequence, 
Number (LSN). 
¢ LSNs are unique/increasing. 
¢ System keeps track of flushedLSN. 


¢ The max LSN flushed so far. RAM 
DISK 


<n ONG 


time > 


Log records flushed to disk ,, 
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LSNs pageLSNs _ flushedLSN 


¢ Log: ordered file with a write buffer 
(tail) in memory 


¢ Each log record has a Log Sequence, 
Number (LSN). 
¢ LSNs are unique/increasing. 
¢ System keeps track of flushedLSN. 
¢ The max LSN flushed so far. 


¢ Each daia page contains a 
pageLSN. 


¢ A pointer to the log 
¢ The LSN of the most recent /og record a 


aS 


for an update to that page. pageLSN 2 
time 


Log records flushed to disk ,; 
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LSNs pageLSNs _ flushedLSN 


° WAL: Before a DB page | is 


caeenen — 
Buffer Pool 
¢ Check that pageLSN, <flusnedLSN f 


(1.e., log record has already been 
flushed) 


time > 


Log records flushed to disk ,, 
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WAL & the Log 
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° WAL: Before a DB page | is 


caeenen — 
Buffer Pool 
¢ Check that pageLSN, < flushedLSN | 


(1.e., log record has already been 
flushed) 


- It is now possible to steal page | 
from disk 


Log records flushed to disk ,; 


WAL & the Log Gi DB omy Qaes 
pageLSNs  flushedLSN 
° WAL: Before a DB page | is 
written to disk, 
¢ Check that pageLSN, < flushedLSN 


(1.e., log record has already been 
flushed) 
- It is now possible to steal page | 
from disk 


¢ But if hot, it can be kept in memory 
and flushed to disk later 


Log records flushed to disk ,, 


Exactly how is logging (and recovery!) doné? 


¢ UNDO Logging (Force/Steal) 
¢ REDO Logging 
¢ ARIES 
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¢ Our goal is to be able to rollback individual transactions 
¢ We need to connect all the log records for a given transaction 
with id XID 
¢ Add a field called prevLSN in each log record 
¢ prevLSN is the LSN of the previous log record corresponding to XID 
¢ Records of transaction form a linked list backwards in time 


Log Records 
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LogRecord fields: 
LSN 
prevLSN 
XID 
type 
pagelD 

update length 
records offset 
only 


before-image 
after-image 
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Log Records sonny | QB ames 
| LogRecord fields: 
¢ Possible log record types: 
¢ Update, Commit, Abort 
¢ Checkpoint 
¢ Compensation Log Records End (CLRs) 
¢ For UNDO actions 
update 


¢ End (end of commit or abort) vecords 
only 
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Log Records 


¢ Update records contain sufficient 
information for UNDO and REDO 


¢ Diff as in the example 


¢ Or could use more space-efficient 
encodings 


LogRecord fields: 


update 
records 
only 
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¢ Two in-memory tables: 
¢ Transaction Table: 

¢ One entry per active Xact. 

¢ Contains X!D, status (running/commited/aborted), and lasiLSN. 
¢ Dirty Page Table: 

¢ One entry per dirty page in buffer pool. 


¢ Contains recLSN -- the LSN of the log record which first caused the page 
to be dirty. 
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ARIES Overview 


a “~ 
ia ce Meaaben ae 
¢ Data Pages - Log Records * Xact Table 
- Each has a pageLSN > LSN Sen 
¢ prevLSN = yet 
¢ Master record - Status 
¢ XID 
oe » Dirty Pages Tabl 
¢ pagelD Irty ho able 
° pi 
° Length aa 
¢ Offset 
¢ before-image - Log Tail 
* after-image - flushedLSN 


Buffer Pool 
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¢ Periodically, the DBMS creates a checkpoint, in order to minimize 
the time taken to recover in the event of a system crash. 


24 


po ee 
| | SSS 
boss 
MOHAMMED VI UM6P-CS 
POLYTECHNIC =+j|,| Cg iM SCHOOL OF 
UNIVERSITY |  “Smse™ COMPUTER SCIENCE 


¢ Problem: database does not run any new transactions during 
checkpointing 
¢ Solution: fuzzy checkpointing 
¢ Save state of all Xacts and the status of each page 


¢ Some Xacts can be running and dirty pages not flushed yet 
¢ Need new data structures to store this information 
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¢ Keep track of: 

¢ Xact status (running, committing, etc.) 

¢ Dirty pages and which Xact’s action first caused page to be dirty 
¢ Save 1 and 2 to disk during checkpointing 
¢ At recovery: 

¢ Re-create 1 and 2 from the log 

¢ Re-create running Xacts and dirty pages in memory 

¢ Replay rest of the log 
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¢ Each log record has a Log Sequence Number (LSN) 
¢ Each data page has a pageLSN 
¢ The LSN of the most recent LSN that updated this page 
¢ Dirty pages Table (recLSN) 
¢ Xacts Table (lastLSN) 
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tsw [previsn| xD [pagelD | LogPayiosd 
20s |= i200, = START 
206 [= rao) =) STR 


¢ WAL: 
¢ We store both the old and new values in an update record 
¢ prevLSN = LSN of the previous log record written by this Xact 
¢ Actions of a transactions form a linked list backwards in time 


Fuzzy Checkpointing -- Example ope | (Baers 
Log (WAL) 


risn [previsn| XID | pagelD | Log Payload — 


r206| = [tan [ =| starr 


Dirty pages Table (DPT) 
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Transactions Buffer Pool 

| XID | lastisn | Status | [p2ipagetsn=209)[ 5] 
Po [Pete aa) 
es ee ee 
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Dirty pages Table (DPT). Write a <BEGIN CHKPT> to log 


- Flush Log to disk 
¢ Continue transactions 


¢ Write DPT and Xacts tables to disk, and write 
<END CHKPT> to disk 


¢ Flush log to disk 


Transactions 


ji | tests | status 
; 
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¢ What to do when a Xact: 
¢ Starts 
¢ Updates a page 
¢ Commits 
¢ Aborts 
¢ What to do when Buffer Manager: 
¢ Reads a page from disk 
¢ Writes a page to disk 
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Log (WAL) 


: eee a) 
- Xact starts: 
¢ Write START record in log 206] - | 1203 | - | START _ 
- Update Xacts table 
¢ Example: T1204 starts 2io[ = | 204 | = | START 


¢ Write <START,1T204> in Log 


¢ Add 1204 in Xacts table and 
set lastLSN to 210 


¢ DPT and BF remain 
unchanged Transactions | XID lastLSN | Status 
7203 


¢ Xact updates: 
¢ Write update record in Log 
¢ Update the LSN pointers: 
¢ prevLSN = lastLSN 
¢ pageLSN =LSN 
¢ lastLSN =LSN 
¢ recLSN = if null then LSN 


¢ Example: T1203 writes 35 in 
P2 


¢ Write <T203,P2,25,35> in 
Log 


¢ New LSN: 211 
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¢ Xact updates: 
¢ Write update record in Log 
¢ Update the LSN pointers: 
¢ prevLSN = lastLSN 
¢ pageLSN =LSN 
¢ lastLSN =LSN D 


ir 
- recLSN = if null then LSN pagelD lastLSN | Status — 
¢ Example: T203 writes 35 in D9 509 
P2 210 | running” 
210 runnin 
¢ Write <1203,P2,25,35> in . 
= 
- NewLSN: 211 ila eh 


¢ Update Other tables ions P7 


- - Buffer Pool 


¢ Page is flushed to disk 


¢ Flush log up to (and incl.) 
pageLSN 


¢ Remove page from DPT and 
BF 


¢ Example: BP flushes P2 
¢ Flush Log up to (and incl.) 
211 


207 


page 


P2 (pageLSN = 211) 


rTastisn | status 


P7 


Buffer Pool 


¢ Page is flushed to disk 


¢ Flush log up to (and incl.) 207 
pageLSN 


¢ Remove page from DPT and 


¢ Example: BP flushes P2 Dirty pages Table (DET 
- Flush Log up to (and incl.) nace lastlSN | Status 


¢ Remove P2 from DPT and 
BP 


- Buffer Pool 


ARIES Normal Operation 


¢ Transaction commits 


Write commit record to Log 
Flush Log up to this entry 
¢« Xact is now committed 


Update Xacts table to 
commit 


Eventually flush dirty pages 
to disk 


Write end record to Log 


Update Xacts table to 
complete 
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¢ Transaction commits 
¢ Example: T203 commits 


Write <Commit, T203> 

Flush Log up up to this entry 
Update Xacts to commit 
Eventually flush dirty pages to Dirty pages Table (DPT Transactions 


Write <END,T100> in Log aaa 
i ee 


= | * 13 


- Buffer Pool 


¢ Transaction commits 
¢ Example: T203 commits 


Write <Commit, T203> 
Flush Log up up to this entry 


Update Xacts to commit 
Dirty pages Table (DPT Transactions 


ir 

| astisw | status 
x 
p 2 } es 
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Eventually flush dirty pages to 
disk 
Write <END,T100> in Log 


Update Xacts table to 
complete 


- - Buffer Pool 


ARIES Normal Operation 


¢ Transaction aborts 


Write abord record to Log 


Find first action to undo from 
lastLSN 


Go to Log and start undo changes 


Write compensation record (CLR) 
to Log 


¢ CLR ‘s undoNextLSN points to next 
record to undo (part of payload) 


Follow prevLSN to retrace more 
actions to undo (if any) 


Once done, change Xacts to abort 
Eventually flush dirty pages to disk 
Write end record to log 

Update Xacts to complete 
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So 
Se 
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¢ Transaction aborts 


¢ Example: T203 aborts 


Write <ABORT,T203>toLog | 206| - | _ 1203 = or 
From lastLSN, LSN 211 is first T203 <25,35> 
to undo 


ya T203 ABORT 
etnies Lee 2 || = 
Follow prevLSN until no more )213 | 212 aa a a 


actions to undo Dirty pages Table (DPT Transactions 


[xi [ Tastisn | status 
r- [| -[- 


- - Buffer Pool 


¢ Transaction aborts 


¢ Example: T203 aborts 


Write <ABORT,T203>toLog | 206| - | _ 1203 = or 
From lastLSN, LSN 211 is first T203 <25,35> 
to undo 


ya T203 ABORT 
ietendwtette Lee, 22 | oe = 
Follow prevLSN until no more i= aa a a 


actions to undo Transactions 


Change Xact status to abort 


: : Buffer Pool 


¢ Transaction aborts 


¢ Example: T203 aborts 
¢ Write <ABORT,T203> to Log 


¢ From lastLSN, LSN 211 is first 
to undo 


¢ Undo 211 and write <CLR> 


¢ Follow prevLSN until no more 
actions to undo 


¢ Change Xact status to abort 
¢ Eventually flush dirty pages to 
disk 


¢ Write <End,T203> to Log 


¢ Update the Xact status to 
complete 


rao | |) START 


ria | mt [Taos [=| anor 
faa [m3 [ aos [= [eNO 


Transactions 


XID | TastiSN | Status 
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Crash Recovery: Big Picture epten | QBurec 
Oldest log 
nee Ora Oo Start from a checkpoint (found 
active at crash ; 
via master record). 
nee o Three phases. Need to: 
dirty p age - Figure out which Xacts committed since 
table after checkpoint, which failed (Analysis). 
Analysis — REDO allactions. 
o (repeat history) 
- UNDO effects of failed Xacts. 
Last chkpt 
CRASH | 
A R U 
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¢ Reconstruct state at checkpoint. 
¢ Via Xact table and DPT stored in the end_checkpoint record 
¢ Scan log forward from checkpoint. 
e End record: Remove Xact from Xact table. 
¢ Update record: If P not in Dirty Page Table, 
¢ Add P to D.P.T., set its recLSN=LSN. 


¢ Other records: Add Xact to Xact table, set |astLSN=LSN, change Xact 
status on commit. 
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¢ We repeat History to reconstruct state at crash: 
¢ Reapply a// updates (even of aborted Xacts!), redo CLRs. 


¢ Scan forward from log rec containing smallest recLSN in D.P.T. 
For each CLR or update log rec LSN, REDO the action unless: 


¢ Affected page is not in the Dirty Page Table, or 
¢ Affected page is in D.P.T., but has recLSN > LSN, or 
¢ pageLSN (in DB) >LSN. 
¢ To REDO an action: 
¢ Reapply logged action. 
¢ Set pageLSN to LSN. No additional logging! 
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ToUndo={lastLSNs of Xacts in the Xact table} 


Repeat: 
¢ Choose largest LSN among ToUndo. 
¢ If this LSN is a CLR and undonextLSN==NULL 
e Write an End record for this Xact. 
¢ If this LSN is a CLR, and undonextLSN != NULL 
¢ Add undonextLSN to ToUndo 


¢ Else this LSN is an update. Undo the update, write a CLR, add prevLSN 
to ToUndo. 


Until ToUndo is empty. 
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LSN LOG 
RAM O00 begin_checkpoint 

05 end_checkpoint 
Xact Table 10 update: T1 writes P5 

20 -> update T2 writes P3 
Dirty Page Table a a eae 

40 CLR: Undo}T1 LSN 10 
flushedLSN 45 -- T1 End 

50 -- update: T3 writes P1 
seen 60 -= update: T2 writes P5 


CRASH, RESTART 
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Example: Crash During Restart! epten | QBure-c 
LSN LOG 
00,05 —— begin_checkpoint, end_checkpoint 
RAM 10 — update: T1 writes P5 
20 — update T2 writes P3 
Xact Table 30 — T1 abort 
40,45 — CLR: Undo T1 LSN 10, T1 End 
Dirty Page Table 50 —- update: T3 writes P1 
60 — update: T2 writes P5 
ane neaer CRASH, RESTART 
70 ~~ CLR: Undo T2 LSN 60 
ToUndo 
80,85 CLR: Undo T3 LSN 50, T3 end 
CRASH, RESTART 
90 CLR: Undo T2 LSN 20, T2 end 
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¢ What happens if system crashes during Analysis? During REDO? 
¢ How do you limit the amount of work in REDO? 

¢ Flush asynchronously in the background. 

¢ Watch “hot spots”! 
¢ How do you limit the amount of work in UNDO? 

¢ Avoid long-running Xacts. 
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° Recovery Manager guarantees Atomicity & Durability. 
¢ Use WAL to allow STEAL/NO-FORCE w/o sacrificing correctness. 


¢ LSNs identify log records; linked into backwards chains per 
transaction (via prevLSN). 


¢ pageLSN allows comparison of data page and log records. 
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° Checkpointing: A quick way to limit the amount of log to scan on 
recovery. 


¢ Recovery works in 3 phases: 
° Analysis: Forward from checkpoint. 
° Redo: Forward from oldest recLSN. 
¢ Undo: Backward from end to first LSN of oldest Xact alive at crash. 


¢ Upon Undo, write CLRs. 
¢ Redo “repeats history”: Simplifies the logic! 
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